Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models

Authors

  • ShengJun Cheng
  • Jiafeng Liu
  • XiangLong Tang
Abstract

This paper shows how to use labeled and unlabeled data to improve inductive models with the help of transductive models. We propose a solution for the self-training scenario. Self-training is an effective semi-supervised wrapper method that can generalize any type of supervised inductive model to the semi-supervised setting. It iteratively refines an inductive model by bootstrapping from unlabeled data. Standard self-training uses the classifier model (trained on labeled examples) to label and select candidates from the unlabeled training set. This may be problematic: because labeled training data is usually scarce, the initial classifier may not be able to provide highly confident predictions. As a result, it can introduce too many wrongly labeled candidates into the labeled training set, which may severely degrade performance. To tackle this problem, we propose a novel self-training style algorithm that incorporates a graph-based transductive model in the self-labeling process. Unlike standard self-training, our algorithm uses labeled and unlabeled data as a whole to label and select unlabeled examples for training set augmentation. A robust transductive model based on a graph Markov random walk is proposed, which exploits the manifold assumption to output reliable predictions on unlabeled data using noisy labeled examples. The proposed algorithm can greatly reduce the risk of performance degradation due to accumulated noise in the training set. Experiments show that the proposed algorithm can effectively utilize unlabeled data to improve classification performance.

Keywords: Inductive model, Transductive model, Semi-supervised learning, Markov random walk.
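The self-labeling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the robust model for noisy labels is not reproduced, and a generic random-walk label propagation over an RBF affinity graph stands in for it. The nearest-centroid classifier is an arbitrary placeholder for "any supervised inductive model". All function names and the parameters `alpha`, `sigma`, `rounds`, `per_round`, and `threshold` are assumptions made for illustration only.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Dense RBF affinity matrix with a zero diagonal (assumed graph construction)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def random_walk_scores(X, y, n_classes, alpha=0.9, sigma=1.0):
    """Transductive step: propagate labels with a random walk with restart.

    y holds class indices for labeled points and -1 for unlabeled points.
    Returns an (n, n_classes) array whose rows sum to 1.
    """
    W = rbf_affinity(X, sigma)
    P = W / W.sum(axis=1, keepdims=True)           # row-stochastic transitions
    Y0 = np.zeros((len(X), n_classes))
    idx = np.flatnonzero(y >= 0)
    Y0[idx, y[idx]] = 1.0                           # one-hot seeds for labeled points
    # Fixed point of F = alpha * P @ F + (1 - alpha) * Y0
    F = np.linalg.solve(np.eye(len(X)) - alpha * P, (1.0 - alpha) * Y0)
    return F / F.sum(axis=1, keepdims=True)


def fit_centroids(X, y, n_classes):
    """Placeholder inductive model: one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_centroids(centroids, X):
    """Assign each row of X to its nearest class centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def transductive_self_training(X_lab, y_lab, X_unl, n_classes,
                               rounds=5, per_round=10, threshold=0.8):
    """Grow the labeled set using transductive predictions, then fit the inductive model."""
    X = np.vstack([X_lab, X_unl])
    y = np.concatenate([y_lab, -np.ones(len(X_unl), dtype=int)])
    for _ in range(rounds):
        unl = np.flatnonzero(y < 0)
        if unl.size == 0:
            break
        probs = random_walk_scores(X, y, n_classes)
        conf = probs[unl].max(axis=1)
        # Take the most confident unlabeled points, then apply the threshold.
        top = unl[np.argsort(-conf)][:per_round]
        picked = top[probs[top].max(axis=1) >= threshold]
        if picked.size == 0:
            break
        y[picked] = probs[picked].argmax(axis=1)    # augment the training set
    keep = y >= 0
    return fit_centroids(X[keep], y[keep], n_classes), y
```

Calling `transductive_self_training(X_lab, y_lab, X_unl, n_classes=2)` returns the fitted centroids and the augmented label vector, in which entries still equal to -1 were never selected. The sketch assumes every class has at least one labeled example and that the graph is small enough for a dense solve.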

Similar resources

Transductive Learning of Structural SVMs via Prior Knowledge Constraints

Reducing the number of labeled examples required to learn accurate prediction models is an important problem in structured output prediction. In this paper we propose a new transductive structural SVM algorithm that learns by incorporating prior knowledge constraints on unlabeled data. Our formulation supports different types of prior knowledge constraints, and can be trained efficiently. Exper...

Multi-Label Classification with Unlabeled Data: An Inductive Approach

The problem of multi-label classification has attracted great interest in the last decade. Multi-label classification refers to problems where an example represented by a single instance can be assigned to more than one category. Until now, most research on multi-label classification has focused on supervised settings, which assume that a large amount of labeled traini...

A relational approach to probabilistic classification in a transductive setting

Transduction is an inference mechanism adopted by several classification algorithms capable of exploiting both labeled and unlabeled data and making predictions for the given set of unlabeled data only. Several transductive learning methods have been proposed in the literature to learn transductive classifiers from examples represented as rows of a classical double-entry table (or relation...

Feature Selection for Classification using Transductive Support Vector Machines

Given unlabeled data in advance, transductive feature selection (TFS) aims to maximize the classification accuracy on these particular unlabeled data by selecting a small set of relevant and less redundant features. Specifically, this paper introduces the use of Transductive Support Vector Machines (TSVMs) for feature selection. We study three inductive SVM-related feature selection methods: corre...

Transductive Learning from Relational Data

Transduction is an inference mechanism “from particular to particular”. Its application to classification tasks implies the use of both labeled (training) data and unlabeled (working) data to build a classifier whose main goal is that of classifying (only) unlabeled data as accurately as possible. Unlike the classical inductive setting, no general rule valid for all possible instances is genera...



Publication date: 2014